Building Cooling Energy Data Analysis in Singapore

Prepared by: Clayton Miller ETHZ

April 22, 2013

Introduction

The goal of this notebook is to introduce a dataset of 2 years worth of time series measurements from an international school campus in Singapore. This data is part of a research project focused on utilizing data analytics to reduce the energy impact of buildings. Data analysis is meant to distill the various raw data sources sometimes found in building controls and energy management systems.

First, a basic introduction to the domain of building cooling systems: Below, in Figure 1, is a simplified diagram of a building cooling system common to large commercial buildings. The total system has many machines working as a system to deliver cold air to each of the zones in the building. There are two key subsystems:

  • Waterside system - the chilled water plant which produces cold water and distributes it throughout the campus; this water is used to cool air at the Air Handling Units (AHUs)

  • Airside system - the components after the chilled water coil in the air handling unit all the way to the zone variable air volume (VAV) box

In [10]:
#Figure 1
from IPython.core.display import Image 
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/Simplified Cooling System Diagram.jpg')
Out[10]:

The building that will be focused on in this study is the United World College Campus in Singapore. It is a 70,000 sq.meter International School in which construction was completed in June 2011 and students first utilized the facility in September of that year. It is a certified Greemark Platinum building which means it is at the top of building ratings in terms of energy efficiency and sustainability.

The facility is extraordinary from a data analysis standpoint as it has a few extra sensors meansuring key energy information and that it is trending and storing much of the data generated by the building controls system. This is much different than a conventional commercial building which hardly ever stores any of the data from its controls systems. The main reason behind this is that operations staff find that too much raw data is unuseful and confusing.

UWC is different in that they would like to store and manage as much data as possible and look to researchers to help then get value from this repository. We will focus on UWC's cooling system as it is approximately 65% of all the energy consumed on campus as year-round cooling and dehumidification are required in Singapore's climate. UWC spends approximately SGD $75,000 (40,000 Euros)/month on cooling energy consumption!

Figure 2 is a diagram of the UWC campus that points out the main 'Blocks' or buildings on campus:

Please note that the Education Block 1 has been renamed to Blocks B, C, and D and Education Block 2 has been renamed to Block A

In [11]:
#Figure 2 - Campus Map
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/campusmap.jpeg')
Out[11]:

In terms of the cooling system components, Figure 3 below shows a diagram of the Chilled Water Plant, Weather Stations, AHU's, and VAV's as they exist in the system. The diagram also includes the types of granular data available from each piece of equipment.

In [12]:
#Figure 3 - Cooling System Diagram for UWC
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/Cooling System Equipment Heirarchy.jpg')
Out[12]:

Objectives of the Analysis

In this section, I will give an idea of what I'd like to do with the data. Figure 4 below gives an overview of the data sources and an importance of each in terms of the goals. We can go into more discussion about these goals on our introduction call.

In [13]:
#Figure 4 - Goals Overview
Image(filename='/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/KaggleGoals.png')
Out[13]:

The Data

Finally, I will go through and describe the datasets and show a quick visualization of each.

Data can be found at: https://www.dropbox.com/sh/kap9dr21idkk51x/wQYgnMie6E

In [14]:
cd ~/Dropbox/03-ETH/91-UWC/KaggleConsulting/Data/
/Users/millerc/Dropbox/03-ETH/91-UWC/KaggleConsulting/Data

In [15]:
ls
Airside.csv
ChillerPlantEquip.csv
ChillerPlantSystem.csv
PrimarySystemMetrics.csv
UWC Cooling System Point List MetaData - UWCPointList.csv
VAVZoneAirflow.csv
VAVZoneTemp.csv

Primary System Metrics Data

The highest level chilled water plant performance data. Almost completely gapless 2 years of 1 min interval data.

In [16]:
import pandas as pd
PrimarySystemMetrics = pd.read_csv('PrimarySystemMetrics.csv', parse_dates=True, index_col='0_1')
In [17]:
PrimarySystemMetrics
Out[17]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1032453 entries, 2011-04-15 00:00:00 to 2013-03-31 23:59:00
Data columns:
ChillerPlantSystem_kW                     1032453  non-null values
ChillerPlantSystem_TonsOfCoolingEnergy    1032453  non-null values
ChillerPlantSystem_Efficiency             892325  non-null values
dtypes: float64(3)

Resample to hourly and plot only 2013 so far:

In [18]:
PrimarySystemMetrics.resample('H').truncate(before='2013-02-01').plot(subplots=True, figsize=(25,30))
Out[18]:
array([Axes(0.125,0.694118;0.775x0.205882),
       Axes(0.125,0.447059;0.775x0.205882), Axes(0.125,0.2;0.775x0.205882)], dtype=object)

Chiller Plant Data

This is supplemental data for the chilled water system as a whole. This data (and the rest of the granular data) is collected by a different system which is much less reliable therefore there are many more gaps in the datasets.

In [19]:
ChillerPlantSystemData = pd.read_csv('ChillerPlantSystem.csv', parse_dates=True, index_col='timestamp')
In [20]:
ChillerPlantSystemData
Out[20]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 80887 entries, 2011-06-01 00:00:00 to 2013-04-04 01:10:00
Data columns:
CP_ChillerPlantRiserMainHeader_CondenserWaterReturnTemp    79980  non-null values
CP_ChillerPlantRiserMainHeader_CondenserWaterSupplyTemp    79981  non-null values
CP_ChillerPlantSystem_Efficiency                           66986  non-null values
CP_ChillerPlantSystem_kW                                   79981  non-null values
CP_TotalChilledWaterPump_Efficiency                        79982  non-null values
CP_TotalChilledWaterPump_kW                                79981  non-null values
CP_TotalChiller_Efficiency                                 79980  non-null values
CP_TotalChiller_kW                                         79982  non-null values
CP_TotalCondenserWaterPump_Efficiency                      79980  non-null values
CP_TotalCondenserWaterPump_kW                              79981  non-null values
CP_TotalCoolingTowerSystem_Efficiency                      79980  non-null values
CP_TotalCoolingTowerSystem_kW                              79981  non-null values
CP_WeatherStation1_OutdoorAirHumidity                      79980  non-null values
CP_WeatherStation1_OutdoorAirWetbulbTemp                   79980  non-null values
CP_WeatherStation1_OutdoorTemp                             79980  non-null values
CP_WeatherStation2_OutdoorAirHumidity                      79981  non-null values
CP_WeatherStation2_OutdoorAirWetbulbTemp                   79982  non-null values
CP_WeatherStation2_OutdoorTemp                             79980  non-null values
dtypes: float64(18)
In [21]:
ChillerPlantSystemData.resample('H').truncate(before='2013-01-01').plot(subplots=True, figsize=(25,30))
Out[21]:
array([Axes(0.125,0.86729;0.775x0.0327103),
       Axes(0.125,0.828037;0.775x0.0327103),
       Axes(0.125,0.788785;0.775x0.0327103),
       Axes(0.125,0.749533;0.775x0.0327103),
       Axes(0.125,0.71028;0.775x0.0327103),
       Axes(0.125,0.671028;0.775x0.0327103),
       Axes(0.125,0.631776;0.775x0.0327103),
       Axes(0.125,0.592523;0.775x0.0327103),
       Axes(0.125,0.553271;0.775x0.0327103),
       Axes(0.125,0.514019;0.775x0.0327103),
       Axes(0.125,0.474766;0.775x0.0327103),
       Axes(0.125,0.435514;0.775x0.0327103),
       Axes(0.125,0.396262;0.775x0.0327103),
       Axes(0.125,0.357009;0.775x0.0327103),
       Axes(0.125,0.317757;0.775x0.0327103),
       Axes(0.125,0.278505;0.775x0.0327103),
       Axes(0.125,0.239252;0.775x0.0327103),
       Axes(0.125,0.2;0.775x0.0327103)], dtype=object)

2013 Data is good but 2011-2012 has a few gaps:

In [22]:
ChillerPlantSystemData.resample('H').truncate(after='2013-01-01').plot(subplots=True, figsize=(25,30))
Out[22]:
array([Axes(0.125,0.86729;0.775x0.0327103),
       Axes(0.125,0.828037;0.775x0.0327103),
       Axes(0.125,0.788785;0.775x0.0327103),
       Axes(0.125,0.749533;0.775x0.0327103),
       Axes(0.125,0.71028;0.775x0.0327103),
       Axes(0.125,0.671028;0.775x0.0327103),
       Axes(0.125,0.631776;0.775x0.0327103),
       Axes(0.125,0.592523;0.775x0.0327103),
       Axes(0.125,0.553271;0.775x0.0327103),
       Axes(0.125,0.514019;0.775x0.0327103),
       Axes(0.125,0.474766;0.775x0.0327103),
       Axes(0.125,0.435514;0.775x0.0327103),
       Axes(0.125,0.396262;0.775x0.0327103),
       Axes(0.125,0.357009;0.775x0.0327103),
       Axes(0.125,0.317757;0.775x0.0327103),
       Axes(0.125,0.278505;0.775x0.0327103),
       Axes(0.125,0.239252;0.775x0.0327103),
       Axes(0.125,0.2;0.775x0.0327103)], dtype=object)

There is more detailed Chiller Plant Equipment data which gives more details on the specific pieces of equipment:

In [23]:
ChillerEquipData = pd.read_csv('ChillerPlantEquip.csv', index_col='timestamp', parse_dates=True)
In [24]:
ChillerEquipData
Out[24]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 80850 entries, 2011-06-01 00:00:00 to 2013-04-04 01:10:00
Data columns:
CP_Chiller1_ChilledWaterFlowrate             79957  non-null values
CP_Chiller1_ChilledWaterReturnTemp           79956  non-null values
CP_Chiller1_ChilledWaterSupplyTemp           79955  non-null values
CP_Chiller1_CondenserWaterFlowrate           79957  non-null values
CP_Chiller1_CondenserWaterReturnTemp         79957  non-null values
CP_Chiller1_CondenserWaterSupplyTemp         79956  non-null values
CP_Chiller1_CoolingTons                      79957  non-null values
CP_Chiller1_Efficiency                       79959  non-null values
CP_Chiller1_kW                               79956  non-null values
CP_Chiller2_ChilledWaterFlowrate             79957  non-null values
CP_Chiller2_ChilledWaterReturnTemp           79957  non-null values
CP_Chiller2_ChilledWaterSupplyTemp           79956  non-null values
CP_Chiller2_CondenserWaterFlowrate           79958  non-null values
CP_Chiller2_CondenserWaterReturnTemp         79958  non-null values
CP_Chiller2_CondenserWaterSupplyTemp         79956  non-null values
CP_Chiller2_CoolingTons                      42559  non-null values
CP_Chiller2_Efficiency                       79958  non-null values
CP_Chiller2_kW                               79958  non-null values
CP_Chiller3_ChilledWaterFlowrate             79958  non-null values
CP_Chiller3_ChilledWaterReturnTemp           79958  non-null values
CP_Chiller3_ChilledWaterSupplyTemp           79957  non-null values
CP_Chiller3_CondenserWaterFlowrate           79958  non-null values
CP_Chiller3_CondenserWaterReturnTemp         79958  non-null values
CP_Chiller3_CondenserWaterSupplyTemp         79958  non-null values
CP_Chiller3_CoolingTons                      79959  non-null values
CP_Chiller3_Efficiency                       79958  non-null values
CP_Chiller3_kW                               79957  non-null values
CP_CoolingTower1_CondenserWaterSupplyTemp    79957  non-null values
CP_CoolingTower2_CondenserWaterSupplyTemp    79958  non-null values
CP_CoolingTower3_CondenserWaterSupplyTemp    79922  non-null values
CP_CoolingTower4_CondenserWaterSupplyTemp    79958  non-null values
CP_CoolingTower5_CondenserWaterSupplyTemp    79958  non-null values
CP_CoolingTower6_CondenserWaterSupplyTemp    79956  non-null values
CP_CoolingTower7_CondenserWaterSupplyTemp    79957  non-null values
CP_CoolingTower8_CondenserWaterSupplyTemp    79958  non-null values
CP_CoolingTower9_CondenserWaterSupplyTemp    79918  non-null values
dtypes: float64(36)

Airside Data

The Air Handling Unit data can serve as an indicator fo system operation and load profiling. There is a lot of good data from the airside to indicate what % of the building is occupied at any given point in time. This is important as occupancy has a significant factor on consumption.

In [25]:
Airsidedata = pd.read_csv('Airside.csv', index_col='timestamp', parse_dates=True)
In [26]:
Airsidedata
Out[26]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 440970 entries, 2011-06-01 00:00:00 to 2013-04-04 06:40:00
Columns: 228 entries, BA_AHU_1_1_CO2Sensor to SC_AHU_MPH_SupplyAirTemp
dtypes: float64(228)

You'll notice there is quite a few more pieces of equipment and therefore points in this dataset. When we plot, it looks like a mess:

In [27]:
Airsidedata.resample('D').plot(figsize=(25,30),legend=False)
Out[27]:
<matplotlib.axes.AxesSubplot at 0x118f11f50>

Due to this amount of data, we'll use a 'mapping set' to help us navigate the data:

In [28]:
DataPointMapping = pd.read_csv('UWC Cooling System Point List MetaData - UWCPointList.csv')
In [29]:
DataPointMapping.head()
Out[29]:
PointName LocationTag EquipTag PointTag Category Class
0 BA_AHU_1_1_CO2Sensor BA AHU 1-1 CO2Sensor Airside Analog
1 BA_AHU_1_1_Humidity BA AHU 1-1 Humidity Airside Analog
2 BA_AHU_1_1_OffcoilTemp BA AHU 1-1 OffcoilTemp Airside Analog
3 BA_AHU_1_1_OnOffStatus BA AHU 1-1 OnOffStatus Airside Binary
4 BA_AHU_1_1_ReturnAirTemp BA AHU 1-1 ReturnAirTemp Airside Analog

Let's use the mapping set to query our Airside dataframe:

In [30]:
CO2SensorData = Airsidedata[(DataPointMapping.PointName[(DataPointMapping.PointTag == 'CO2Sensor')])]
CO2SensorData
Out[30]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 440970 entries, 2011-06-01 00:00:00 to 2013-04-04 06:40:00
Data columns:
BA_AHU_1_1_CO2Sensor         113221  non-null values
BA_AHU_1_2_CO2Sensor         113272  non-null values
BA_AHU_2_1_CO2Sensor         113270  non-null values
BA_AHU_3_1_CO2Sensor         113267  non-null values
BA_AHU_5_1A_CO2Sensor        11764  non-null values
BA_AHU_5_1B_CO2Sensor        113220  non-null values
BA_AHU_5_1C_CO2Sensor        113270  non-null values
BA_AHU_5_1D_CO2Sensor        113281  non-null values
BB_AHU_1_1_CO2Sensor         119854  non-null values
BB_AHU_1_2_CO2Sensor         119854  non-null values
BB_AHU_1_3_CO2Sensor         119830  non-null values
BB_AHU_EXAM_CO2Sensor        119836  non-null values
BD_AHU_2_1_CO2Sensor         118934  non-null values
SC_AHU_1_CO2Sensor           76574  non-null values
SC_AHU_2_CO2Sensor           67548  non-null values
SC_AHU_3_CO2Sensor           65495  non-null values
SC_AHU_4_CO2Sensor           76572  non-null values
SC_AHU_BLACKBOX_CO2Sensor    76636  non-null values
SC_AHU_MPH_CO2Sensor         113135  non-null values
dtypes: float64(19)
In [31]:
CO2SensorData.resample('D').plot(figsize=(25,15),ylim=(0,1000))
Out[31]:
<matplotlib.axes.AxesSubplot at 0x10ec4d590>

VAV Data

The VAV data is essentially the room conditions in each of the air conditioned zones. This is essentially the thermostat you see on the wall in each room and the amount of air coming in through the diffuser in the ceilling. I think this information is quite valuable in terms of understanding performance from a thermal comfort perspective.

In [32]:
VAVZoneTemp = pd.read_csv('VAVZoneTemp.csv',index_col='timestamp',parse_dates=True)
In [33]:
VAVZoneTemp
Out[33]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 354085 entries, 2011-06-01 00:00:00 to 2013-04-04 06:40:00
Columns: 338 entries, BA_VAV_01_01_01_ZoneTemp to IB_VAV_03_04_09_ZoneTemp
dtypes: float64(338)
In [34]:
VAVZoneTemp.resample('D').plot(figsize=(30,25),legend=False, ylim=(15,35))
Out[34]:
<matplotlib.axes.AxesSubplot at 0x10fa08a10>
In [34]: